An Objective Method for Forecasting Solar Flares


1. INTRODUCTION

This report is a continuation of an earlier study (Hirman et al, 1980) in which
multivariate discriminant analysis (MVDA) is used in a computer program to
produce an objective daily solar flare forecast. The essential feature of the
statistics package is the comparison between a number of input parameters and a
number of output classes, in which the discrimination between the classes in
terms of the input parameters is maximized by constructing appropriate
classification functions. In the application to flare prediction, the input
parameters are daily solar parameters for each active region on the solar disk,
and the output classes are the levels of flare activity occurring the following
day within the same active regions. We have used more than two years of data, of
which approximately 25 percent has been used to derive the classification
functions. The latter are then extrapolated forward in time to produce a true
forecast.

The computer program, known as BMD07M, was originally written at UCLA,¹
although the particular version used here was developed further by Seagraves²
to include the Cooley and Lohnes classification procedure³ and the Lachenbruch
N-1 technique.⁴ The Cooley and Lohnes procedure does not assume uniformity of
variance, and this sometimes results in better classification scores. The
computational burden, however, is increased because linear classification
functions are not possible; instead, canonical variables, constructed from the
original input parameters, are used as a transformation to reduce the matrix
dimension in the classification formulas. The Lachenbruch technique removes
bias when the program classifies its own data base.

Received for publication 3 Feb 1981

1. Dixon, W. J. (ed.) (1968) Biomedical Computer Programs, University of

2. Seagraves, P. H. (1972) UBC BMD07M Stepwise Discriminant Analysis,
   University of British Columbia Computing Centre Documentation.

A complete description of the mathematics is beyond the scope of this report.
The reader may consult Anderson⁵ and Rao⁶ for references on discriminant
analysis. A discussion of the suitability of applying various statistical
methods to discrete input variables is contained in Vecchia et al.⁷ The latter
point is of particular interest because the work of Vecchia et al uses the same
discrete data base as used herein to produce solar flare probability forecasts
using discriminant analysis (without the Cooley and Lohnes procedure) and
logistic regression analysis.
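For orientation only, the linear classification functions of standard discriminant analysis can be sketched in a few lines. This is not the BMD07M code: the function names are hypothetical, the sketch assumes a pooled (common) within-class covariance matrix and uses the training-set class frequencies as prior probabilities, and it omits both the stepwise parameter selection and the Cooley and Lohnes variant.

```python
import numpy as np

def fit_linear_discriminant(X, y):
    """Derive one linear classification function per class.

    Assumes a pooled within-class covariance matrix (the standard
    linear case) and class priors taken from the training frequencies.
    """
    classes = np.unique(y)
    n, p = X.shape
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    pooled = np.zeros((p, p))
    for c, m in zip(classes, means):
        d = X[y == c] - m
        pooled += d.T @ d
    pooled /= n - len(classes)
    priors = np.array([(y == c).mean() for c in classes])
    # Row k holds the coefficients of the classification function for class k.
    weights = np.linalg.solve(pooled, means.T).T
    consts = -0.5 * np.sum(weights * means, axis=1) + np.log(priors)
    return classes, weights, consts

def posterior_probabilities(X, weights, consts):
    """Turn classification-function values into class probabilities."""
    scores = X @ weights.T + consts
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    expo = np.exp(scores)
    return expo / expo.sum(axis=1, keepdims=True)
```

Each row of the returned probabilities sums to unity, which matches the mutually exclusive forecast format used later in the report.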

An important feature of the present study is the comparison of the objective,
computer forecast with a subjective, conventional forecast prepared during the
same test period for the same active regions on the sun. Without such a
benchmark for relative evaluation, the presentation of any forecast method has
considerably reduced merit.

2. DATA


The data used herein were obtained from the region analysis program at the
NOAA Space Environment Services Center (SESC) in Boulder, Colorado. The
region analysis program collects daily a variety of solar parameters for each
active region on the solar disk. It is important to note that there is no
attempt in this program to select the more flare-productive regions. The
parameters include radio and X-ray data, but most are derived from optical data
supplied by the USAF/AWS SOON system. The parameters contain information which
the SESC forecasters consider vital to the preparation of a 24-hour flare
forecast. The present study uses data for the period 1 January 1977 to
31 January 1979, containing 6095 active-region days (records) that have been
checked for errors and internal consistency. Random scrutiny, however, has
shown that errors still remain. After several reassignments of parameter values
and definitions, we arrived at the form of the data base shown in Table 1.

3. Cooley, W. W., and Lohnes, P. R. (1962) Multivariate Procedures for the
   Behavioral Sciences, Wiley, New York.

4. Lachenbruch, P. A., and Mickey, M. R. (1968) Technometrics, 10:1.

5. Anderson, T. W. (1958) An Introduction to Multivariate Statistical Analysis,
   Wiley, New York.

6. Rao, C. R. (1974) Advanced Statistical Methods in Biometric Research,
   Hafner.

7. Vecchia, D. F., Caldwell, G. A., Tryon, P. V., and Jones, R. H. (1980)
   in Sol.-Terres. Pred. Proc., Vol. 3, R. F. Donnelly (ed.), C-76.


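Table 1. SESC Region Analysis Parameters (Modified)

PARAMETER                                                   ASSIGNED VALUE

 1. DATE
 2. REGION NUMBER
 3. REGION'S FIRST APPEARANCE LONGITUDE
 4. CURRENT LONGITUDE
 5. N/S LATITUDE
 6. CURRENT LATITUDE
 7. CARRINGTON LONGITUDE
 8. REGION AGE
 9. SPOT CLASS 1
      A ......................................................... 1
      B ......................................................... 2
      C ......................................................... 4
      D ......................................................... 6
      E ......................................................... 7
      F ......................................................... 8
      H ......................................................... 3
10. SPOT CLASS 2
      r/x ....................................................... 1
      a ......................................................... 3
      h ......................................................... 4
      k ......................................................... 5
11. SPOT CLASS 3
      x ......................................................... 1
      o ......................................................... 2
      i ......................................................... 3
      c ......................................................... 4
12. MAGNETIC CLASS
      No spots .................................................. 0
      Alpha ..................................................... 1
      Beta ...................................................... 2
      Beta-Gamma ................................................ 3
      Gamma ..................................................... 4
      Beta-Delta ................................................ 5
      Beta-Gamma-Delta .......................................... 6
      Gamma-Delta ............................................... 7
13. MAGNETIC POLARITY OF STRONGEST FIELD ........................ (+/-)
14. MAGNETIC FIELD STRENGTH ..................................... (Gauss)
15. MAGNETIC GRADIENTS .......................................... (Gamma/km)
16. INTERACTION WITH ANOTHER REGION
      None ...................................................... 0
      Spots of opposite polarity converge (from less
      than two degrees apart) ................................... 1
17. SUNSPOT DYNAMICS
      No spots or no motion ..................................... 0
      Coalescing of spots ....................................... 1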

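    PLAGE COMPACTNESS
      Non-compact ............................................... 0
      Compact ................................................... 1
    NEUTRAL LINE ORIENTATION
      Weak structure ............................................ 0
      North-south (± 45 deg) .................................... 1
      East-west (± 45 deg) ...................................... 2
      Hairpin ................................................... 3
      Circular .................................................. 4
    REVERSE POLARITY
      Normal polarity ........................................... 0
      Reverse polarity .......................................... 1
    NEUTRAL LINE COMPLEXITY
      Straight line or weak structure ........................... 0
      1-3 kinks ................................................. 1
      4-6 ....................................................... 2
      7-9 ....................................................... 3
      >9 ........................................................ 4
    NEUTRAL LINE CHANGES
      No trend .................................................. 0
      Becoming simple ........................................... 1
      Becoming complex .......................................... 2
    BRIGHT POINTS
      None ...................................................... 0
      Occurred, but not along neutral line ...................... 1
      Occurred along neutral line ............................... 2
    PLAGE FLUCTUATIONS
      None ...................................................... 0
      Occurred .................................................. 1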


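31. ISOLATED POLE
32. EMERGING FLUX
      None, or region is new .................................... 0
      New flux emerges within spot group ........................ 1
      New flux emerges near region (within 5 deg) ............... 2
33. ARCH FILAMENT SYSTEM
34. RADIO BURST/SWEEP
      None occurred ............................................. 0
      >250 flux units at 10 cm .................................. 1
      >1000 flux units at 10 cm ................................. 2
      Type III .................................................. 3
      Type IV ................................................... 4
      Type II and IV ............................................ 5
      U burst ................................................... 6
      Major/complex 10 cm burst ................................. 7
      >1000 flux units at 10 cm plus a U burst, or
      Type III and IV, or
      >250 flux units at 10 cm plus Type III and IV ............. 8
35. REGION'S FIRST APPEARANCE (TRANSIT HISTORY)
36. FLARE HISTORY
      No flares have occurred ................................... 0
      C class flares have occurred .............................. 1
      M class flares have occurred .............................. 2
      X class flares have occurred .............................. 3
37. FLARES TODAY
      None ...................................................... 0
      C class ................................................... 1
      M class ................................................... 2
      X class ................................................... 3
38. PROTONS HISTORY
      None occurred ............................................. 0
      Proton event occurred ..................................... 1
      Ground level event ........................................ 2
39. PROTONS TODAY
      None ...................................................... 0
      Occurred .................................................. 1
40. REGION FORECASTS (SESC)
      Probabilities for each class of flare (none, C, M, or X) for
      each region, for the 24-hour period beginning at 0 hr UT next
      day. Proton event probabilities are similarly stated.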
Most of the parameters in Table 1 have been assigned discrete values
according to categories which are subjectively related to increasing flare
activity. This subjectivity is the weakest link in any scheme utilizing
objective procedures for producing a forecast solely from data. In essence, the
situation merely allows the element of subjectivity to reside entirely in the
data acquisition process. Probably, this situation is preferable to having
subjectivity introduced also in the forecast preparation. There are several
parameters (e.g., spot class, flare history, magnetic class) for which assigned
values are based upon quantitative studies. Fortunately (or perhaps therefore!),
these parameters are among those from which the objective forecast derives most
of its skill.

Perhaps the most unfortunate circumstance is that for a large number of
records one or more parameters is missing. In the computer program, missing
data codes are replaced by averages for the particular parameter in the set of
records used in deriving the classification functions. Missing data, in
addition to errors, makes the testing of objective techniques difficult,
especially for determining the relative significance of various parameters. In
order to portray some feeling for the degree of representation in the data base
we note the following: for three commonly observed parameters, "Spot Class 2,"
"Magnetic Class," and "Flares Today," only 5893 of the total 6095 records
contain all three; if "Bright Points," "Spot Class 3," "Spot Class 1,"
"Magnetic Gradients," and "Sunspot Dynamics" are added to the first three, only
3732 records remain; and for a total of 15 of the 31 usable parameters, only
510 records contain all 15. This is, indeed, a hardship for statistical
analysis. Nevertheless, we are able to show later that at least some of these
frequently missing parameters contain valuable predictive information.
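The replacement of missing data codes by training-set averages, as described above, can be illustrated with a minimal sketch. The missing-data code (-1.0 here) and the function name are assumptions made for illustration; the report does not state the code actually used in the data base.

```python
import numpy as np

MISSING = -1.0  # hypothetical missing-data code; the real code is not given

def impute_with_training_means(train, test, missing=MISSING):
    """Replace missing codes by per-parameter averages of the training set.

    The averages are computed only from the records used to derive the
    classification functions and are then applied to both sets.
    """
    train = np.asarray(train, dtype=float)
    test = np.asarray(test, dtype=float)
    masked = np.where(train == missing, np.nan, train)
    means = np.nanmean(masked, axis=0)  # average over observed values only
    filled_train = np.where(train == missing, means, train)
    filled_test = np.where(test == missing, means, test)
    return filled_train, filled_test, means
```

A parameter that is missing from every training record has no average to substitute; in this sketch it would come out as NaN and would have to be dropped before analysis.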

The data base contains daily, region-by-region entries for the actual flare
activity, in addition to the official SESC subjectively derived flare forecast.
Thus, the information required for objective forecast testing, as well as for
comparison with the SESC forecast, is contained in the same base. Flares are
listed according to their peak soft (1-8 Å) X-ray flux E at 1 AU:

    Class C:   $10^{-6} \le E < 10^{-5}$  watt m$^{-2}$
    Class M:   $10^{-5} \le E < 10^{-4}$  watt m$^{-2}$
    Class X:   $E \ge 10^{-4}$            watt m$^{-2}$

From the standpoint of geophysical environment studies, the classes M and X
are of greatest importance.
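The flux thresholds above map directly onto the three forecast outcomes used later in the report. The sketch below is illustrative only: the function names are hypothetical, and treating any event below the C threshold as "No Flare" is an assumption, since the report does not say how sub-C events are recorded.

```python
def flare_class(peak_flux_w_m2):
    """Classify a flare by its peak soft X-ray flux (watt per square meter)."""
    if peak_flux_w_m2 >= 1e-4:
        return "X"
    if peak_flux_w_m2 >= 1e-5:
        return "M"
    if peak_flux_w_m2 >= 1e-6:
        return "C"
    return "No Flare"  # assumption: sub-C events are not counted as flares

def largest_event_outcome(peak_fluxes):
    """Reduce one region-day of flares to the outcome of its largest event."""
    largest = flare_class(max(peak_fluxes, default=0.0))
    # M and X are grouped into a single outcome class.
    return "M or X" if largest in ("M", "X") else largest
```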

In addition to Table 1, six combination parameters (Table 2), derived from
certain original parameters, were included as input parameters. These six were
found to have possible predictive significance in the earlier study⁸ where
twenty such combination parameters were tested. The derivation of combination
parameters is based on intuitions about the form in which predictive
information might be contained in the data, and about physical quantities
(e.g., energy stored in sheared magnetic fields) presumed relatable to flares.
The subject of these and other combination parameters will be discussed in a
later section.

Table 2. Combination Parameters (Numbers in right-hand
column refer to original parameter number in Table 1)

8. Hirman, J. W., Neidig, D. F., Seagraves, P. H., Flowers, W. E., and
   Wiborg, P. H. (1980) in Sol.-Terres. Pred. Proc., Vol. 3, R. F. Donnelly
   (ed.), C-64.


3. PROCEDURE


The region analysis parameters for today are independent of any information
on flare activity occurring tomorrow; therefore, they can be used in practice,
today, to produce a flare forecast for tomorrow, assuming that predictive
information is present in the parameters. We have used the first N records
(with N = 1500, as described below) as a "training set" in order to derive the
classification functions for three possible outcomes: "No Flare," "C Flare,"
and "M or X Flare." M and X flares were grouped together as a single class in
order to reduce statistical noise caused by the relatively few cases of larger
flares. The classification functions were then applied to new records, using
only the input parameters, in order to produce a true forecast. The latter
procedure was accomplished in steps of 250 records each, with the training set
sliding forward in time 250 records (approximately one month) after each step.
Thus, for a 1500-record training set, the remaining 4595-record test set
requires 19 individual subtests of 250 records each (except for the nineteenth,
which contains the final 95 records). This sliding base technique maintains a
constant N records in the training set, thereby assuring that the program is
trained on recent data relative to the test subset. This, combined with the
relatively small size of the test subset, minimizes the effects of secular
trends, either of observational or solar origin, which might be present in the
data.
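The sliding base technique can be sketched as an index generator. The function name and the use of zero-based record indices are assumptions for illustration; the record counts (6095 total, 1500 in training, 250 per step) are those given in the text.

```python
def sliding_subsets(n_records, n_train=1500, step=250):
    """Yield (training range, test range) pairs of record indices.

    The training window always holds n_train records and slides forward
    by `step` records after each test subset; the last test subset is
    shorter when the records run out.
    """
    start = 0
    while start + n_train < n_records:
        train = range(start, start + n_train)
        test = range(start + n_train,
                     min(start + n_train + step, n_records))
        yield train, test
        start += step
```

With 6095 records this produces 19 test subsets covering 4595 records in total, the last subset holding 95 records.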

The computer program was trained on the X-ray class of the largest event
(No Flare, C Flare, or M or X Flare) occurring in the region in the 24-hour
period following the acquisition date of the input parameters. Thus, the
computer forecast is expressed in terms of probabilities for the largest event
to be in one of these classes. The outcomes are mutually exclusive, with the
sum of probabilities over all classes equal to unity. The SESC forecast,
however, is a probability forecast for the occurrence of each class of event;
i.e., a non-exclusive format. In order to assess the quality of the computer
forecast, we derived a comparison forecast in the "exclusive" format by
selecting the largest event class in the SESC forecast that was assigned a
probability greater than or equal to 0.5. Although this is not an SESC
forecast, it is probably representative of what would be extant if the SESC
chose to cast their predictions in this mode.
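The derivation of the comparison forecast from the non-exclusive SESC probabilities can be sketched as follows. The dictionary layout and the function name are hypothetical, and falling back to "No Flare" when no flare class reaches the threshold is an assumption; the report states only the 0.5 selection rule.

```python
FLARE_CLASSES = ("C", "M", "X")  # ordered from smallest to largest event

def comparison_forecast(probabilities, threshold=0.5):
    """Pick the largest flare class forecast with probability >= threshold.

    `probabilities` maps a flare class to the forecast probability that
    an event of that class occurs (non-exclusive format).
    """
    largest = "No Flare"  # assumption: used when no class qualifies
    for flare_class in FLARE_CLASSES:
        if probabilities.get(flare_class, 0.0) >= threshold:
            largest = flare_class
    # M and X are merged to match the three outcome classes of the report.
    return "M or X" if largest in ("M", "X") else largest
```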

In the following test results we present the forecasts according to both the
standard multivariate discriminant analysis (MVDA) and the Cooley and Lohnes
procedure (MVDA/CL). There are important differences in the character of
these two forecasts, which, as will be shown later, may be used to advantage.




4. TEST RESULTS


4.1 Preliminary Discussion

We are concerned mainly with the behavior of the computer forecasts relative
to the comparison forecast when, for example, changes are made in the size of
the training set, choice of input parameters, solar activity levels, and
percent of missing data. In all cases we present the computer forecast along
with the comparison forecast for the same set of test records. Also included is
a list of input parameters submitted to analysis, along with their frequency of
selection in classifying the three outcomes. Note, however, that due to the
250-record increment the training sets are independent of each other only when
separated by six or more subsets.

As a first step, we eliminated 11 parameters which were not selected in any
of the 19 subsets. Following this, the program was run again using the
remaining input parameters. The results are given in Tables 3, 4, and 5. This
test (A) will serve as an example for the displays used elsewhere in the
report. Table 3 shows the actual matrix of region-day forecasts vs region-day
largest events, for the three forecasts. Table 5, derived from the data in
Table 3, summarizes the following:

    F      Percent of forecasts correct in the given event class
    E      Percent of region-day largest events which were forecasted
    A      (F + E)/2
    C      Climatology (percent of the total number of events in the class)
    U      Unweighted mean of the A's for all three event classes
    W      Weighted mean forecast accuracy (the sum of the matrix
           diagonal elements divided by the total number of forecasts,
           or events, in all classes)
    Off 1  Percent of forecasts that are one matrix element away from
           the diagonal
    Off 2  Percent of forecasts that are two matrix elements away from
           the diagonal

These various scores are of interest because of the several ways in which
forecasts can be used. For example, the F score, or percentage of forecasts
that are correct, is the quantity of interest to a customer who cannot tolerate
false alarms. A quite different requirement applies, however, in a situation
where surprise flares are unwelcome. In the latter case, the E score is the
important measurement. Of course, knowing the customer's need in advance
allows the forecast to be biased either toward underprediction, which tends to
improve the F score, or toward overprediction, which improves the E score.
As a measure of the "balanced" accuracy of a forecast in a given event class
we, therefore, introduce the average of F and E, given by A.

Table 3. Comparison of Forecasts--Test A
(1500-record training set)

Largest Event              Largest Event Observed            Total
Forecasted             No Flare         C       M & X     Forecasts

COMPARISON
  No Flare                 3376       213          25          3614
  C                         501       199          59           759
  M & X                      77        82          63           222
  Total Events             3954       494         147          4595

MVDA
  No Flare                 3349       190          26          3565
  C                         513       206          51           770
  M & X                      92        98          70           260
  Total Events             3954       494         147          4595

MVDA/CL
  No Flare                 3739       316          50          4105
  C                         185       142          50           377
  M & X                      30        36          47           113
  Total Events             3954       494         147          4595

Table 4. Parameters Submitted to Analysis and
Their Frequency of Selection in 19 Subsets--Test A

Flares Today    19     New No. 1      9     Mag. Pol.        5
Bright Points   19     Mag. Grad.     9     Neut. L. Chg.    5
New No. 2       17     Mag. Class     6     Spot Class 3     [illegible]
Spot Dynam.     12     Radio B/S      6     Spot Class 2     3
New No. 5       12     Flare Hist.    6     Spot Inter.      3
Proton Hist.    11     New No. 3      6     Emerg. Flux      1
Spot Class 1     9     New No. 4      6

Table 5. Comparison of Forecast Scores--Test A

Forecaster   Event         F      E      A      C      U      W   Off 1  Off 2

COMPARISON   No Flare   93.4   85.4   89.4   86.1
             C          26.2   40.3   33.3   10.8   52.8   79.2   18.6    2.2
             M & X      28.4   42.9   35.6    3.2

MVDA         No Flare   93.9   84.7   89.3   86.1
             C          26.8   41.7   34.3   10.8   53.6   78.9   18.4    2.6
             M & X      26.9   47.6   37.3    3.2

MVDA/CL      No Flare   91.1   94.6   92.8   86.1
             C          37.7   28.7   33.2   10.8   54.3   85.5   12.8    1.7
             M & X      41.6   32.0   36.8    3.2

The accuracy of a forecast is always dependent upon the climatology for
the event being forecasted. Higher climatological probabilities tend to improve
the chances for predictions to be correct. For example, it is easy to predict
"No Flare" with 90 percent accuracy, simply because no flare occurs in almost
90 percent of all active-region days. In comparing cumulative scores between
forecasts it is imperative to note the climatology which prevailed during the
test period. Climatology is affected by a number of factors, including event
classification criteria, duration of forecast interval, and level of solar
activity. In essence, climatology is directly dependent upon "bin size."
Failure to state climatological conditions clearly (an unfortunately common
practice) makes intercomparison of forecasts almost impossible.⁹ It seems that
this point cannot be emphasized enough.
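The dependence of accuracy on climatology can be made concrete with the observed event totals of Test A (3954 No Flare, 494 C, and 147 M or X region-days out of 4595). The short sketch below uses a hypothetical helper name and computes the climatological percentages, which also give the overall accuracy of a forecast that always says "No Flare".

```python
# Observed largest-event totals for the 4595 test records of Test A.
EVENT_TOTALS = {"No Flare": 3954, "C": 494, "M or X": 147}

def climatology(event_totals):
    """Percent of all region-days falling in each event class."""
    total = sum(event_totals.values())
    return {name: 100.0 * count / total
            for name, count in event_totals.items()}

clim = climatology(EVENT_TOTALS)
# A constant "No Flare" forecast is correct on every No Flare day and on
# no other day, so its overall accuracy equals the No Flare climatology.
constant_forecast_accuracy = clim["No Flare"]
```

A constant forecast of this kind reaches about 86 percent overall accuracy while never predicting a single flare, which is why the climatology must accompany any quoted score.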

Because "No Flare" constitutes the majority of situations on the sun, it comes
as no surprise that solar flare forecasts are usually quite accurate overall;
i.e., their weighted means (W) are high. It is of greater interest, however, to
predict flares than quiet conditions and, for this reason, the unweighted score
U, given simply by the mean of the A scores over all classes, has been included
in Table 5.

Finally,  we  note  that  if  a  forecast  is  in  error,  it  is  better  to  be  wrong  by  one 
event  class  than  by  two.  Thus,  the  tendency  for  the  off-diagonal  entries  in  the 
matrix  to  cluster  near  the  diagonal  is  an  important  measure  when  comparing  fore¬ 
cast  scores  which  are  similar  otherwise.  Table  5  includes  a  measure  of  this 
error  distribution  in  the  form  of  the  Off  1  and  Off  2  scores. 
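The scores defined above follow mechanically from a forecast matrix. The sketch below is an illustration rather than the original program: the function name is hypothetical, and the matrix layout (rows are forecast classes, columns are observed classes, both ordered No Flare, C, M and X) is taken from Table 3. The sample matrix holds the comparison forecast counts of Test A.

```python
import numpy as np

def forecast_scores(matrix):
    """Compute F, E, A, C, U, W, Off 1, and Off 2 (all in percent).

    matrix[i][j] is the number of region-days for which class i was
    forecast and class j was the largest event observed.
    """
    m = np.asarray(matrix, dtype=float)
    total = m.sum()
    diag = np.diag(m)
    F = 100.0 * diag / m.sum(axis=1)      # forecasts that were correct
    E = 100.0 * diag / m.sum(axis=0)      # events that were forecast
    A = (F + E) / 2.0
    C = 100.0 * m.sum(axis=0) / total     # climatology
    rows, cols = np.indices(m.shape)
    distance = np.abs(rows - cols)        # classes away from the diagonal
    return {
        "F": F, "E": E, "A": A, "C": C,
        "U": A.mean(),
        "W": 100.0 * diag.sum() / total,
        "Off 1": 100.0 * m[distance == 1].sum() / total,
        "Off 2": 100.0 * m[distance == 2].sum() / total,
    }

comparison = [[3376, 213, 25],
              [501, 199, 59],
              [77, 82, 63]]
scores = forecast_scores(comparison)
```

Rounded to one decimal place, these values agree with the comparison forecast entries listed in Table 5.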

The scores (F, E, and A) have uncertainties of approximately ±1, ±3, and
±5 for No Flare, C Flare, and M & X Flare, respectively. The U and W scores
have uncertainties of about ±1. Thus, in terms of A and U, the three forecasts
in Table 5 are essentially identical. The MVDA/CL forecast definitely excels in
the W score, although this is mainly due to its tendency for underprediction,
which places a large number of forecasts in the No Flare row of the forecast
matrix. The tendency for underprediction in the MVDA/CL forecast is evident
also in the F scores for C and M & X flares, which are significantly higher
than the corresponding E scores.

On the other hand, both the comparison and the MVDA forecast are biased toward
overprediction. Their overall similarity is quite striking.

4.2 Effect of Training Set Size

The number of records to be used in the training set should be large enough
to provide sufficient statistics to train the computer program, yet small
enough to avoid the effects of trends in the data. The optimum number, while
not known from theory, may be determined empirically by varying the training
set size and comparing the scores of the resulting forecasts. Table 6 shows the
results for training sets of 750 and 2095 records. Together with Table 5
(1500-record training set) we find differences of only small significance. A
close examination of the trend in the various scores, however, suggests that
there may be some improvement, especially in the MVDA/CL forecast, as the size
of the training set is increased from 750 to 1500 records. The improvement is
less certain in increasing the set from 1500 to 2095. According to motivations
which will be described later, the E score is of interest in the case of the
MVDA forecast, while the F score is of prime importance for MVDA/CL. Noting
these, the U scores, and the fact that we do not wish to make the training set
unnecessarily large, we have decided to use 1500 records in all training sets.

Table 6. Comparison of Scores Using 750-Record
and 2095-Record Training Sets--Tests B and C

Forecaster      Event         F      E      A      U      W   Off 1  Off 2

MVDA            No Flare   93.9   86.0   89.9
750 Records     C          28.4   41.3   34.8   53.3   80.0   17.0    3.0
                M & X      25.2   45.1   35.1

MVDA/CL         No Flare   91.7   93.7   92.7
750 Records     C          36.2   32.7   34.4   52.8   85.1   13.1    1.8
                M & X      36.3   26.0   31.2

MVDA            No Flare   93.9   84.0   89.0
2095 Records    C          25.9   42.8   34.4   53.6   78.4   19.4    2.2
                M & X      28.6   46.4   37.5

MVDA/CL         No Flare   91.0   95.0   93.0
2095 Records    C          38.0   29.4   33.7   54.3   85.8   12.8    1.4
                M & X      45.8   26.4   36.1

9. Simon, P., Smith, J. B., Ding, Y., Flowers, W., Guo, Q., Harvey,
   K. L., Hedeman, R., Martin, S. F., McKenna Lawlor, S., Lin, V.,
   Neidig, D., Obridko, V. N., Dodson Prince, H., Rust, D., Speich, D.,
   Starr, A., and Stepanyan, N. N. (1980) in Sol.-Terres. Pred. Proc.,
   Vol. 2, R. F. Donnelly (ed.), p. 287.

4.3  Inclusion  of  Additional  Combination  Parameters 

Table 4 indicates that five of the six combination parameters from Table 2
were retained for analysis after the initial parameter selection. Because
several of these ranked highly in frequency of selection in Test A, we decided
to test additional combination parameters. As in the case of the original six,
the additional parameters were derived on the basis of intuition. Their
formulas are given in Table 7.

The 20 new combination parameters, in addition to the 20 parameters used
in Test A, were submitted to analysis in Test D (Tables 8 and 9). It is
convenient to defer the discussion of the latter to the following section.


Table 7. Additional Combination Parameters (Numbers
in right-hand column refer to original parameter numbers
in Table 1)

New Parameter No.       Parameter Formula

Rates of Change
     7                  29 (today) - 29 (yesterday)
     8                  37 - 37
     9                  (9·10·11·12) - (9·10·11·12)
    10                  17 - 17
    11                  (12·17·27) - (12·17·27)
    12                  [illegible]
    13                  9 - 9
    14                  (9·10·11) - (9·10·11)
    15                  15 - 15
    16                  12 - 12

Parameters Squared
    17                  [illegible]
    18                  37^2
    19                  (9·10·11·12)^2
    20                  17^2
    21                  9^2
    22                  (New 7)^2
    23                  (New 8)^2
    24                  (New 9)^2
    25                  (New 10)^2
    26                  (New 13)^2
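The two families of formulas in Table 7 can be sketched as small helper functions. Several points here are assumptions rather than statements from the report: the dots between parameter numbers are read as products, each rate of change is read as today's value minus yesterday's for the same region, and the function names and the dictionary layout (parameter number mapped to assigned value) are hypothetical.

```python
def parameter_product(record, numbers):
    """Product of the listed Table 1 parameters for one region-day record."""
    result = 1.0
    for number in numbers:
        result *= record[number]
    return result

def rate_of_change(today, yesterday, numbers):
    """Today's value of a parameter (or product) minus yesterday's value."""
    return parameter_product(today, numbers) - parameter_product(yesterday, numbers)

def squared(record, numbers):
    """Square of a parameter (or product of parameters)."""
    return parameter_product(record, numbers) ** 2

# Hypothetical records: keys are Table 1 parameter numbers.
today = {9: 4, 10: 3, 11: 2, 12: 3, 37: 2}
yesterday = {9: 2, 10: 1, 11: 1, 12: 2, 37: 0}
new_8 = rate_of_change(today, yesterday, [37])             # Flares Today change
new_9 = rate_of_change(today, yesterday, [9, 10, 11, 12])  # spot/magnetic product
new_18 = squared(today, [37])
```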



Table 8. Parameters Submitted to Analysis and Their Frequency
of Selection in 19 Subsets--Tests D, E, F, G, and H

        No. of
Test    Parameters

D       40      Flares Today   19     Radio B/S       6     Flare Hist.     2
                New 18         17     New 9           6     New 4           2
                New 2          16     New 12          6     New 23          2
                Bright Pts.    14     New 1           5     Spot Class 2    1
                New 19         12     Neut. L. Chg.   5     Emerg. Flux     1
                New 15         10     New 5           5     New 17          1
                Mag. Grad.      9     New 14          5     New 19          1
                Proton Hist.    9     New 21          5     Spot Class 1    0
                New 3           9     New 22          5     New 11          0
                New 8           9     Mag. Pol.       4     New 13          0
                New 20          8     New 7           4     New 16          0
                Mag. Class      7     Spot Inter.     3     New 26          0
                New 10          7     New 25          3
                Spot Class 3    6     New 20          2

A       20      See Table 4

E       15      Flares Today   19     Spot Class 3   14     Radio B/S       6
                Bright Pts.    19     Spot Dynam.    13     Spot Class 1    5
                Mag. Class     16     Proton Hist.   11     Mag. Pol.       5
                Mag. Grad.     16     Flare Hist.    10     Emerg. Flux     4
                Spot Class 2   14     Neut. L. Chg.   6     Spot Inter.     3

F        8      Flares Today   19     Spot Class 2   16     Spot Dynam.    11
                Bright Pts.    19     Spot Class 3   14     Spot Class 1    5
                Mag. Class     17     Mag. Grad.     13

G        5      Flares Today   19     Mag. Class     18     Spot Class 3   15
                Bright Pts.    19     Spot Class 2   16

H        3      Flares Today   19     Mag. Class     19     Spot Class 2   19



Table 9. Effects of Reduction in the Number of Input Parameters

                         Number of
Forecaster               Parameters      U      W   Off 1  Off 2      R

COMPARISON                            52.8   79.2   18.6    2.2   2.22

TEST D     MVDA              40       53.8   79.5   18.4    2.1   2.38
           MVDA/CL                    54.6   85.1   13.4    1.5   0.73

TEST A     MVDA              20       53.6   78.9   18.4    2.6   2.63
           MVDA/CL                    54.3   85.5   12.8    1.7   0.60

TEST E     MVDA              15       52.4   78.1   18.7    3.2   2.80
           MVDA/CL                    53.9   85.4   12.8    1.8   0.65

TEST F     MVDA               8       52.2   78.0   18.6    3.4   2.83
           MVDA/CL                    53.3   85.0   13.4    1.6   0.71

TEST G     MVDA               5       53.0   77.6   18.5    3.9   3.02
           MVDA/CL                    53.7   84.4   14.0    1.7   0.83

TEST H     MVDA               3       51.5   78.2   17.5    4.3   2.72
           MVDA/CL                    53.5   85.2   13.3    1.5   0.61

4.4  Reduction  in  the  Number  of  Parameters 

The computer forecast was subjected to a series of reductions (Tests E, F,
G, and H) in the number of input parameters, according to Table 8, with the
corresponding forecast results summarized in Table 9. Table 9 displays the
effects of parameter reduction beginning with 40 parameters and ending with
only three. In addition to the previously used scores we introduce R, the ratio
of the number of matrix entries below the diagonal to the number above the
diagonal. This ratio provides a measure of the asymmetry of the forecast, with
values greater than unity indicating overprediction, and values less than unity
indicating underprediction.
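The asymmetry ratio R can be computed directly from a forecast matrix. In this sketch the function name is hypothetical and the matrix layout (rows are forecast classes, columns are observed classes, ordered No Flare, C, M and X) is assumed from Table 3, so that entries below the diagonal are forecasts larger than the observed event. The two sample matrices are the Test A counts.

```python
import numpy as np

def asymmetry_ratio(matrix):
    """Ratio of matrix entries below the diagonal to those above it."""
    m = np.asarray(matrix, dtype=float)
    below = np.tril(m, k=-1).sum()  # forecast class larger than observed
    above = np.triu(m, k=1).sum()   # forecast class smaller than observed
    return below / above

mvda = [[3349, 190, 26],
        [513, 206, 51],
        [92, 98, 70]]
mvda_cl = [[3739, 316, 50],
           [185, 142, 50],
           [30, 36, 47]]
r_mvda = asymmetry_ratio(mvda)        # greater than unity: overprediction
r_mvda_cl = asymmetry_ratio(mvda_cl)  # less than unity: underprediction
```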

Table 9 clearly illustrates that the reduction in the number of parameters
has a small but unfavorable effect on the computer forecasts. We may regard
the tendencies for R to depart further from unity, for Off 2 to increase, and
for U to decline, as evidence for progressively worsening forecasts. These
three effects are most noticeable in the MVDA forecast, while the latter effect
alone is marginally evident in MVDA/CL.


The effects of the parameter reduction are offset by the increase in the
number of records containing all or most of the parameters submitted for
analysis in the reduced sets. This improvement in representation occurs because
in the reduction steps we usually eliminated those parameters that were least
significant, i.e., those chosen least often in the subsets of the previous test;
and, generally, the lower the significance of a parameter, the more often it is
missing from the data base. It is concluded, therefore, that the decline in
forecast quality in Table 9 would have been more pronounced had all parameters
been present in all records. This proves that there is valuable predictive
information contained in at least some of the less significant parameters. It is
emphasized that, perhaps to a large degree, the lower significance of these
parameters is due only to their frequent absence from the data base.
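
The link between parameter reduction and record representation can be sketched as follows. The records, parameter names, and helper function are hypothetical and serve only to illustrate the counting; they are not taken from the data base used in this study.

```python
def fully_represented(records, parameters):
    """Count records in which every listed parameter has a value (not None)."""
    return sum(
        all(rec.get(name) is not None for name in parameters)
        for rec in records
    )


# Hypothetical records; None marks a parameter missing from the data base.
records = [
    {"flares_today": 2, "mag_class": 3, "radio_burst": None},
    {"flares_today": 0, "mag_class": 1, "radio_burst": 1},
    {"flares_today": 1, "mag_class": None, "radio_burst": None},
    {"flares_today": 0, "mag_class": 2, "radio_burst": None},
]

full_set = ["flares_today", "mag_class", "radio_burst"]
reduced_set = ["flares_today", "mag_class"]  # often-missing parameter dropped

n_full = fully_represented(records, full_set)
n_reduced = fully_represented(records, reduced_set)
print(n_full, n_reduced)
```

Dropping the parameter that is most often missing raises the number of complete records, which is the offsetting effect described above.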

A final word must be noted regarding the combination parameters. Table 8
indicates that a number of these new parameters have been selected by the
computer program as significant in classifying the outcomes. Due to the complex
intercorrelations among various parameters, however, in addition to possible
variance stabilization effects and other statistical phenomena, we do not fully
understand the true significance of these combination parameters. Questions
such as this probably must await further testing on data bases containing fewer
missing parameters.

4.5  Tests on a Fully Represented Data Base

The most important test of the computer forecast is achieved in the case
where all the parameters submitted to analysis are present in all records of the
data base. Such a test, using the full set of parameters, is impossible with the
presently available data. A test can be made on a fully represented base,
however, if, for example, only eight parameters are used, and we are willing to
accept a reduced base of 3732 records, of which only 2232 remain in the test set.
Such a test (I) was performed, and the results are shown in Tables 10, 11, and 12.

Test I shows a dramatic improvement in the MVDA/CL computer forecast
in all scores, while the MVDA and comparison forecasts show smaller
improvements. These improvements occur despite the somewhat lower flare climatology
that applies to this particular test set. The fact that the comparison
(subjective) forecast scores are higher indicates that the more complete observational
coverage during this sample of records somehow benefits the subjective methods
also.

Due  to  the  reduced  number  of  records,  the  errors  associated  with  the  Test  I 
scores  are  about  50  percent  higher  than  those  stated  earlier.  Nevertheless,  there 
now  seems  no  question  that  the  MVDA/CL  forecast  is  superior  to  the  others. 
Table 10. Comparison of Forecasts Using a Fully
Represented Data Base--Test I (1500-record training set)

              Largest Event       Largest Event Observed       Total
              Forecasted        No Flare      C     M & X    Forecasts

COMPARISON    No Flare            1754       90        8       1852
              C                    193       70       24        287
              M & X                 28       31       34         93
              Total Events        1975      191       66       2232

MVDA          No Flare            1707       67       10       1784
              C                    232       92       24        348
              M & X                 36       32       32        100
              Total Events        1975      191       66       2232

MVDA/CL       No Flare            1829       97       14       1940
              C                    145       82       33        260
              M & X                  1       12       19         32
              Total Events        1975      191       66       2232

Table 11. Parameters Submitted to Analysis and
Their Frequency of Selection in 9 Subsets--Test I

Flares Today     9
Mag. Class       8
Spot Class 2     5
Bright Pts.      9
Mag. Grad.       8
Spot Class 1     1
Spot Class 3     8
Spot Dynam.      8
Table 12. Comparison of Forecasts--Test I

Forecaster    Event         F      E      A      C      V    Correct   Off 1   Off 2

COMPARISON    No Flare    94.7   88.8   91.8   88.5
              C           24.4   36.6   30.5    8.6   55.5    83.2     15.1     1.6
              M & X       36.6   51.5   44.1    3.0

MVDA          No Flare    95.7   86.4   91.1   88.5
              C           26.4   48.2   37.3    8.6   56.2    82.0     15.9     2.1
              M & X       32.0   48.5   40.2    3.0

MVDA/CL       No Flare    94.3   92.6   93.4   88.5
              C           31.5   42.9   37.2    8.6   58.3    86.5     12.9     0.7
              M & X       59.4   28.8   44.1    3.0
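
The scores in Table 12 can be reproduced from the counts in Table 10 under a set of assumed definitions, which are inferred here from the numerical agreement between the two tables rather than restated from the earlier sections: F is the percentage of forecasts of a class that verified, E is the percentage of events of a class that were forecast, A is the mean of F and E, C is the climatological frequency of the class, and Off 1 and Off 2 are the percentages of cases missed by one and two classes. The function name is hypothetical. A minimal sketch using the MVDA block of Table 10:

```python
def forecast_scores(matrix):
    """Compute per-class and overall scores from a forecast/observed matrix.

    matrix[i][j] counts cases forecast as class i and observed as class j.
    The score definitions are assumptions inferred from Tables 10 and 12.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    per_class = []
    for k in range(n):
        hits = matrix[k][k]
        forecasts = sum(matrix[k])                     # row total
        events = sum(matrix[i][k] for i in range(n))   # column total
        f = 100.0 * hits / forecasts
        e = 100.0 * hits / events
        per_class.append(
            {"F": f, "E": e, "A": (f + e) / 2, "C": 100.0 * events / total}
        )

    def off_by(d):
        # Percentage of cases whose forecast missed by exactly d classes.
        count = sum(
            matrix[i][j] for i in range(n) for j in range(n) if abs(i - j) == d
        )
        return 100.0 * count / total

    return per_class, off_by(0), off_by(1), off_by(2)


mvda_test_i = [
    [1707, 67, 10],   # forecast No Flare
    [232, 92, 24],    # forecast C
    [36, 32, 32],     # forecast M & X
]

per_class, correct, off1, off2 = forecast_scores(mvda_test_i)
print(round(per_class[0]["F"], 1), round(per_class[0]["E"], 1))
print(round(correct, 1), round(off1, 1), round(off2, 1))
```

Under these assumptions the MVDA row of Table 12 is recovered to the printed precision, which also serves as a consistency check on the two tables.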

5.  CONCLUSIONS AND RECOMMENDATIONS


The conclusions of this study may be summarized as follows:

1. The standard MVDA forecast is very similar to the comparison forecast
used in this study in terms of overall accuracy and bias toward overprediction.

2. The MVDA/CL forecast is superior overall to either the MVDA or the
comparison forecast, and is biased toward underprediction.

3. The optimum size for the training set is probably about 1500 records
for the climatologies that prevailed during 1977 and 1978.

4. "Flares Today" is the most valuable prediction parameter in the data
base used here, with the "Bright Points" parameter a very close second.
Other important parameters are "Magnetic Class," "Magnetic Gradient," "Spot
Class," and "Sunspot Dynamics."

5. Combination parameters, although their role is not fully understood,
seem to improve forecast scores.

6. Some of the often missing parameters (which probably, therefore, only
appear to be less significant as predictors) contain valuable predictive
information. Probable candidates include "Radio Burst/Sweep," "Neutral Line Changes,"
"Neutral Line Complexity," and "Emerging Flux."

The MVDA/CL procedure may be capable of producing forecasts superior
to any presently available using conventional, subjective techniques. It has been
shown that its skill becomes markedly evident when complete parameter
representation is achieved in the data base. On the basis of this, we predict that with
improvements in data consistency, as well as the inclusion of new, objective
parameters in the future, the computer forecast scores will continue to improve.


This  study  has  led  us  to  make  the  following  recommendations  concerning 
the  use  of  the  two  computer  forecasts: 

1.  Provide  a  flare  forecast  derived  from  MVDA/CL  for  those  customers 
who  cannot  tolerate  false  flare  alarms  (note  the  comparison  of  F  scores  in 
Table  12). 

2. Provide a flare forecast derived from standard MVDA for those
customers who need to be forewarned of flares as often as possible (compare E
scores in Table 12).

3. Improve the coverage for the parameters in Table 1 that are deemed
'less significant' by virtue of their frequent absence in the data base.

4. Improve the objectivity and consistency of all parameters.